Comparing methods for automatic acquisition of Topic Signatures
نویسندگان
چکیده
The main goal of this work is to compare two methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever and Infomap, for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense examples from large text collections. Both systems construct a query for each word sense using WordNet. The quality of the acquired Topic Signatures is indirectly evaluated on the Word Sense Disambiguation English Lexical Task of Senseval-2. keywords: Topic Signatures, acquisition, Latent Semantic Indexing, Word Sense Disambiguation, Multilingual Central Repository, WordNet.
منابع مشابه
The Automated Acquisition of Topic Signatures for Text Summarization
In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures{sets of related words, with associated weights, organized around head topics{and illustrate with signatures we created with 6,194 TREC collection texts over 4 selected topics. We describe the possible integration of to...
متن کاملتولید خودکار الگوهای نفوذ جدید با استفاده از طبقهبندهای تک کلاسی و روشهای یادگیری استقرایی
In this paper, we propose an approach for automatic generation of novel intrusion signatures. This approach can be used in the signature-based Network Intrusion Detection Systems (NIDSs) and for the automation of the process of intrusion detection in these systems. In the proposed approach, first, by using several one-class classifiers, the profile of the normal network traffic is established. ...
متن کاملAutomatic Acquisition of English Topic Signatures Based on a Second Language
We present a novel approach for automatically acquiring English topic signatures. Given a particular concept, or word sense, a topic signature is a set of words that tend to co-occur with it. Topic signatures can be useful in a number of Natural Language Processing (NLP) applications, such as Word Sense Disambiguation (WSD) and Text Summarisation. Our method takes advantage of the different way...
متن کاملAn Empirical Study for the Automatic Acquisition of Topic Signatures
The main goal of this work is to compare different methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever [Cuadros et al., 2004] and Infomap [Dorow and Widdows, 2003], for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense examples from large text collections. We also inc...
متن کاملAutomatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
متن کامل